Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 2000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 730 |
| Duplicate rows (%) | 36.5% |
| Total size in memory | 140.8 KiB |
| Average record size in memory | 72.1 B |
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 1 |
Warnings
| Warning | Type |
|---|---|
| Dataset has 730 (36.5%) duplicate rows | Duplicates |
| Pregnancies is highly correlated with Age | High correlation |
| Age is highly correlated with Pregnancies | High correlation |
| SkinThickness is highly correlated with Insulin | High correlation |
| Insulin is highly correlated with SkinThickness | High correlation |
| BloodPressure is highly correlated with BMI | High correlation |
| Insulin is highly correlated with DiabetesPedigreeFunction | High correlation |
| BMI is highly correlated with BloodPressure and 1 other field | High correlation |
| SkinThickness is highly correlated with BMI | High correlation |
| DiabetesPedigreeFunction is highly correlated with Insulin | High correlation |
| Pregnancies has 301 (15.0%) zeros | Zeros |
| BloodPressure has 90 (4.5%) zeros | Zeros |
| SkinThickness has 573 (28.6%) zeros | Zeros |
| Insulin has 956 (47.8%) zeros | Zeros |
| BMI has 28 (1.4%) zeros | Zeros |
Reproduction
| Analysis started | 2021-06-20 14:05:37.412593 |
|---|---|
| Analysis finished | 2021-06-20 14:05:56.247683 |
| Duration | 18.84 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
Pregnancies
Real number (ℝ≥0)
| Distinct | 17 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.7035 |
| Minimum | 0 |
|---|---|
| Maximum | 17 |
| Zeros | 301 |
| Zeros (%) | 15.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 6 |
| 95-th percentile | 10 |
| Maximum | 17 |
| Range | 17 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.306063033 |
|---|---|
| Coefficient of variation (CV) | 0.8926861166 |
| Kurtosis | 0.409867576 |
| Mean | 3.7035 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.9823655943 |
| Sum | 7407 |
| Variance | 10.93005278 |
| Monotonicity | Not monotonic |
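Several of the figures reported for this variable (the one with 301 zeros, which the warnings identify as Pregnancies) are derived from one another. The following is a minimal sketch, not the report's own code, that recomputes the mean, coefficient of variation, variance, range and IQR from the other values quoted in the tables above. All variable names are illustrative.

```python
# Figures copied from the tables above; variable names are illustrative only.
n_observations = 2000
total = 7407               # "Sum"
std = 3.306063033          # "Standard deviation"
q1, q3 = 1, 6              # "Q1" and "Q3"
minimum, maximum = 0, 17   # "Minimum" and "Maximum"

mean = total / n_observations    # mean = sum / number of observations
cv = std / mean                  # coefficient of variation = std / mean
variance = std ** 2              # variance = std squared
iqr = q3 - q1                    # interquartile range = Q3 - Q1
value_range = maximum - minimum  # range = max - min

print(f"mean={mean:.4f}  cv={cv:.6f}  variance={variance:.5f}")
print(f"iqr={iqr}  range={value_range}")
```

The recomputed values agree with the reported Mean (3.7035), CV (0.8926861166), Variance (10.93005278), IQR (5) and Range (17) up to rounding.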
Histogram with fixed size bins (bins=17)
| Value | Count | Frequency (%) |
| 1 | 356 | |
| 0 | 301 | |
| 2 | 284 | |
| 3 | 195 | |
| 4 | 191 | |
| 5 | 141 | 7.0% |
| 6 | 131 | 6.6% |
| 7 | 100 | 5.0% |
| 8 | 96 | 4.8% |
| 9 | 70 | 3.5% |
| Other values (7) | 135 | 6.8% |
| Value | Count | Frequency (%) |
| 0 | 301 | |
| 1 | 356 | |
| 2 | 284 | |
| 3 | 195 | |
| 4 | 191 | |
| 5 | 141 | 7.0% |
| 6 | 131 | 6.6% |
| 7 | 100 | 5.0% |
| 8 | 96 | 4.8% |
| 9 | 70 | 3.5% |
| Value | Count | Frequency (%) |
| 17 | 3 | 0.1% |
| 15 | 2 | 0.1% |
| 14 | 7 | 0.4% |
| 13 | 22 | 1.1% |
| 12 | 23 | 1.1% |
| 11 | 24 | 1.2% |
| 10 | 54 | 2.7% |
| 9 | 70 | 3.5% |
| 8 | 96 | 4.8% |
| 7 | 100 | 5.0% |
Glucose
Real number (ℝ≥0)
| Distinct | 136 |
|---|---|
| Distinct (%) | 6.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 121.1825 |
| Minimum | 0 |
|---|---|
| Maximum | 199 |
| Zeros | 13 |
| Zeros (%) | 0.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 80 |
| Q1 | 99 |
| median | 117 |
| Q3 | 141 |
| 95-th percentile | 181 |
| Maximum | 199 |
| Range | 199 |
| Interquartile range (IQR) | 42 |
Descriptive statistics
| Standard deviation | 32.06863565 |
|---|---|
| Coefficient of variation (CV) | 0.2646309133 |
| Kurtosis | 0.5603705831 |
| Mean | 121.1825 |
| Median Absolute Deviation (MAD) | 20 |
| Skewness | 0.1588058725 |
| Sum | 242365 |
| Variance | 1028.397392 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 99 | 49 | 2.5% |
| 100 | 44 | 2.2% |
| 102 | 39 | 1.9% |
| 129 | 37 | 1.8% |
| 112 | 36 | 1.8% |
| 95 | 36 | 1.8% |
| 106 | 36 | 1.8% |
| 105 | 34 | 1.7% |
| 111 | 33 | 1.7% |
| 120 | 33 | 1.7% |
| Other values (126) | 1623 |
| Value | Count | Frequency (%) |
| 0 | 13 | 0.7% |
| 44 | 2 | 0.1% |
| 56 | 3 | 0.1% |
| 57 | 5 | 0.2% |
| 61 | 3 | 0.1% |
| 62 | 2 | 0.1% |
| 65 | 3 | 0.1% |
| 67 | 2 | 0.1% |
| 68 | 7 | |
| 71 | 9 |
| Value | Count | Frequency (%) |
| 199 | 3 | 0.1% |
| 198 | 2 | 0.1% |
| 197 | 8 | 0.4% |
| 196 | 5 | 0.2% |
| 195 | 8 | 0.4% |
| 194 | 10 | 0.5% |
| 193 | 6 | 0.3% |
| 191 | 2 | 0.1% |
| 190 | 3 | 0.1% |
| 189 | 8 | 0.4% |
BloodPressure
Real number (ℝ≥0)
| Distinct | 47 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 69.1455 |
| Minimum | 0 |
|---|---|
| Maximum | 122 |
| Zeros | 90 |
| Zeros (%) | 4.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 43.8 |
| Q1 | 63.5 |
| median | 72 |
| Q3 | 80 |
| 95-th percentile | 90 |
| Maximum | 122 |
| Range | 122 |
| Interquartile range (IQR) | 16.5 |
Descriptive statistics
| Standard deviation | 19.18831482 |
|---|---|
| Coefficient of variation (CV) | 0.2775063426 |
| Kurtosis | 5.32848981 |
| Mean | 69.1455 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | -1.854476017 |
| Sum | 138291 |
| Variance | 368.1914255 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=47)
| Value | Count | Frequency (%) |
| 74 | 145 | 7.2% |
| 70 | 144 | 7.2% |
| 78 | 128 | 6.4% |
| 68 | 125 | 6.2% |
| 64 | 120 | 6.0% |
| 72 | 118 | 5.9% |
| 80 | 98 | 4.9% |
| 62 | 94 | 4.7% |
| 76 | 93 | 4.7% |
| 60 | 92 | 4.6% |
| Other values (37) | 843 |
| Value | Count | Frequency (%) |
| 0 | 90 | 4.5% |
| 24 | 2 | 0.1% |
| 30 | 3 | 0.1% |
| 38 | 3 | 0.1% |
| 40 | 2 | 0.1% |
| 44 | 11 | 0.5% |
| 46 | 6 | 0.3% |
| 48 | 13 | 0.7% |
| 50 | 31 | 1.6% |
| 52 | 29 | 1.5% |
| Value | Count | Frequency (%) |
| 122 | 3 | 0.1% |
| 114 | 3 | 0.1% |
| 110 | 7 | |
| 108 | 5 | |
| 106 | 9 | |
| 104 | 5 | |
| 102 | 3 | 0.1% |
| 100 | 9 | |
| 98 | 8 | |
| 96 | 8 |
SkinThickness
Real number (ℝ≥0)
| Distinct | 53 |
|---|---|
| Distinct (%) | 2.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20.935 |
| Minimum | 0 |
|---|---|
| Maximum | 110 |
| Zeros | 573 |
| Zeros (%) | 28.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 23 |
| Q3 | 32 |
| 95-th percentile | 44.05 |
| Maximum | 110 |
| Range | 110 |
| Interquartile range (IQR) | 32 |
Descriptive statistics
| Standard deviation | 16.10324291 |
|---|---|
| Coefficient of variation (CV) | 0.7692019541 |
| Kurtosis | 0.1555797786 |
| Mean | 20.935 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 0.2072281256 |
| Sum | 41870 |
| Variance | 259.3144322 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 573 | 28.6% |
| 32 | 83 | 4.2% |
| 30 | 75 | 3.8% |
| 23 | 60 | 3.0% |
| 27 | 58 | 2.9% |
| 18 | 54 | 2.7% |
| 28 | 54 | 2.7% |
| 39 | 52 | 2.6% |
| 33 | 51 | 2.5% |
| 31 | 50 | 2.5% |
| Other values (43) | 890 |
| Value | Count | Frequency (%) |
| 0 | 573 | 28.6% |
| 7 | 3 | 0.1% |
| 8 | 6 | 0.3% |
| 10 | 13 | 0.7% |
| 11 | 14 | 0.7% |
| 12 | 21 | 1.1% |
| 13 | 30 | 1.5% |
| 14 | 15 | 0.8% |
| 15 | 33 | 1.7% |
| 16 | 15 | 0.8% |
| Value | Count | Frequency (%) |
| 110 | 2 | 0.1% |
| 99 | 2 | 0.1% |
| 63 | 3 | 0.1% |
| 60 | 2 | 0.1% |
| 59 | 2 | 0.1% |
| 56 | 3 | 0.1% |
| 54 | 4 | 0.2% |
| 52 | 4 | 0.2% |
| 51 | 3 | 0.1% |
| 50 | 7 | 0.4% |
Insulin
Real number (ℝ≥0)
| Distinct | 182 |
|---|---|
| Distinct (%) | 9.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 80.254 |
| Minimum | 0 |
|---|---|
| Maximum | 744 |
| Zeros | 956 |
| Zeros (%) | 47.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 40 |
| Q3 | 130 |
| 95-th percentile | 293 |
| Maximum | 744 |
| Range | 744 |
| Interquartile range (IQR) | 130 |
Descriptive statistics
| Standard deviation | 111.1805335 |
|---|---|
| Coefficient of variation (CV) | 1.385358157 |
| Kurtosis | 5.128261644 |
| Mean | 80.254 |
| Median Absolute Deviation (MAD) | 40 |
| Skewness | 1.996084356 |
| Sum | 160508 |
| Variance | 12361.11104 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 956 | 47.8% |
| 105 | 31 | 1.6% |
| 140 | 24 | 1.2% |
| 180 | 23 | 1.1% |
| 130 | 22 | 1.1% |
| 120 | 21 | 1.1% |
| 100 | 20 | 1.0% |
| 135 | 17 | 0.9% |
| 94 | 17 | 0.9% |
| 76 | 17 | 0.9% |
| Other values (172) | 852 |
| Value | Count | Frequency (%) |
| 0 | 956 | 47.8% |
| 14 | 3 | 0.1% |
| 15 | 3 | 0.1% |
| 16 | 3 | 0.1% |
| 18 | 5 | 0.2% |
| 22 | 3 | 0.1% |
| 23 | 4 | 0.2% |
| 25 | 2 | 0.1% |
| 29 | 3 | 0.1% |
| 32 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| 744 | 2 | 0.1% |
| 680 | 2 | 0.1% |
| 600 | 2 | 0.1% |
| 579 | 4 | 0.2% |
| 545 | 2 | 0.1% |
| 540 | 3 | 0.1% |
| 510 | 3 | 0.1% |
| 495 | 5 | 0.2% |
| 485 | 3 | 0.1% |
| 480 | 7 | 0.4% |
BMI
Real number (ℝ≥0)
| Distinct | 247 |
|---|---|
| Distinct (%) | 12.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 32.193 |
| Minimum | 0 |
|---|---|
| Maximum | 80.6 |
| Zeros | 28 |
| Zeros (%) | 1.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 21.8 |
| Q1 | 27.375 |
| median | 32.3 |
| Q3 | 36.8 |
| 95-th percentile | 45.01 |
| Maximum | 80.6 |
| Range | 80.6 |
| Interquartile range (IQR) | 9.425 |
Descriptive statistics
| Standard deviation | 8.149900701 |
|---|---|
| Coefficient of variation (CV) | 0.2531575405 |
| Kurtosis | 4.131722134 |
| Mean | 32.193 |
| Median Absolute Deviation (MAD) | 4.7 |
| Skewness | -0.09045533681 |
| Sum | 64386 |
| Variance | 66.42088144 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 31.2 | 33 | 1.7% |
| 32 | 33 | 1.7% |
| 31.6 | 29 | 1.5% |
| 0 | 28 | 1.4% |
| 33.3 | 27 | 1.4% |
| 32.8 | 25 | 1.2% |
| 32.4 | 25 | 1.2% |
| 32.9 | 24 | 1.2% |
| 30.8 | 24 | 1.2% |
| 29.7 | 22 | 1.1% |
| Other values (237) | 1730 |
| Value | Count | Frequency (%) |
| 0 | 28 | 1.4% |
| 18.2 | 8 | 0.4% |
| 18.4 | 2 | 0.1% |
| 19.1 | 2 | 0.1% |
| 19.3 | 3 | 0.1% |
| 19.4 | 2 | 0.1% |
| 19.5 | 6 | 0.3% |
| 19.6 | 6 | 0.3% |
| 20 | 3 | 0.1% |
| 20.1 | 5 | 0.2% |
| Value | Count | Frequency (%) |
| 80.6 | 2 | 0.1% |
| 67.1 | 3 | 0.1% |
| 64.4 | 2 | 0.1% |
| 59.4 | 3 | 0.1% |
| 57.3 | 3 | 0.1% |
| 55 | 3 | 0.1% |
| 53.2 | 3 | 0.1% |
| 52.9 | 3 | 0.1% |
| 52.7 | 2 | 0.1% |
| 52.3 | 4 | 0.2% |
DiabetesPedigreeFunction
Real number (ℝ≥0)
| Distinct | 505 |
|---|---|
| Distinct (%) | 25.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.47093 |
| Minimum | 0.078 |
|---|---|
| Maximum | 2.42 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.8 KiB |
Quantile statistics
| Minimum | 0.078 |
|---|---|
| 5-th percentile | 0.141 |
| Q1 | 0.244 |
| median | 0.376 |
| Q3 | 0.624 |
| 95-th percentile | 1.136 |
| Maximum | 2.42 |
| Range | 2.342 |
| Interquartile range (IQR) | 0.38 |
Descriptive statistics
| Standard deviation | 0.3235525587 |
|---|---|
| Coefficient of variation (CV) | 0.687050217 |
| Kurtosis | 5.006839839 |
| Mean | 0.47093 |
| Median Absolute Deviation (MAD) | 0.168 |
| Skewness | 1.811978894 |
| Sum | 941.86 |
| Variance | 0.1046862582 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.258 | 16 | 0.8% |
| 0.207 | 15 | 0.8% |
| 0.52 | 13 | 0.7% |
| 0.261 | 13 | 0.7% |
| 0.238 | 13 | 0.7% |
| 0.268 | 13 | 0.7% |
| 0.292 | 13 | 0.7% |
| 0.284 | 12 | 0.6% |
| 0.259 | 12 | 0.6% |
| 0.551 | 12 | 0.6% |
| Other values (495) | 1868 |
| Value | Count | Frequency (%) |
| 0.078 | 2 | 0.1% |
| 0.084 | 2 | 0.1% |
| 0.085 | 5 | 0.2% |
| 0.088 | 6 | 0.3% |
| 0.089 | 2 | 0.1% |
| 0.092 | 2 | 0.1% |
| 0.096 | 3 | 0.1% |
| 0.1 | 3 | 0.1% |
| 0.101 | 2 | 0.1% |
| 0.102 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| 2.42 | 3 | 0.1% |
| 2.329 | 2 | 0.1% |
| 2.137 | 3 | 0.1% |
| 1.893 | 2 | 0.1% |
| 1.781 | 2 | 0.1% |
| 1.731 | 3 | 0.1% |
| 1.699 | 2 | 0.1% |
| 1.698 | 3 | 0.1% |
| 1.6 | 3 | 0.1% |
| 1.476 | 2 | 0.1% |
Age
Real number (ℝ≥0)
| Distinct | 52 |
|---|---|
| Distinct (%) | 2.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.0905 |
| Minimum | 21 |
|---|---|
| Maximum | 81 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.8 KiB |
Quantile statistics
| Minimum | 21 |
|---|---|
| 5-th percentile | 21 |
| Q1 | 24 |
| median | 29 |
| Q3 | 40 |
| 95-th percentile | 58 |
| Maximum | 81 |
| Range | 60 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 11.78642311 |
|---|---|
| Coefficient of variation (CV) | 0.3561875193 |
| Kurtosis | 0.8263829494 |
| Mean | 33.0905 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 1.181267223 |
| Sum | 66181 |
| Variance | 138.9197696 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 22 | 192 | 9.6% |
| 21 | 166 | 8.3% |
| 25 | 134 | 6.7% |
| 24 | 122 | 6.1% |
| 23 | 103 | 5.1% |
| 28 | 98 | 4.9% |
| 26 | 84 | 4.2% |
| 27 | 81 | 4.0% |
| 29 | 70 | 3.5% |
| 31 | 58 | 2.9% |
| Other values (42) | 892 |
| Value | Count | Frequency (%) |
| 21 | 166 | 8.3% |
| 22 | 192 | 9.6% |
| 23 | 103 | 5.1% |
| 24 | 122 | 6.1% |
| 25 | 134 | 6.7% |
| 26 | 84 | 4.2% |
| 27 | 81 | 4.0% |
| 28 | 98 | 4.9% |
| 29 | 70 | 3.5% |
| 30 | 56 | 2.8% |
| Value | Count | Frequency (%) |
| 81 | 3 | 0.1% |
| 72 | 3 | 0.1% |
| 70 | 3 | 0.1% |
| 69 | 6 | 0.3% |
| 68 | 3 | 0.1% |
| 67 | 10 | 0.5% |
| 66 | 12 | 0.6% |
| 65 | 8 | 0.4% |
| 64 | 3 | 0.1% |
| 63 | 13 | 0.7% |
Outcome
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.8 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 2000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1316 | 65.8% |
| 1 | 684 | 34.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1316 | 65.8% |
| 1 | 684 | 34.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1316 | 65.8% |
| 1 | 684 | 34.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1316 | 65.8% |
| 1 | 684 | 34.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1316 | 65.8% |
| 1 | 684 | 34.2% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
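The calculation described above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the code the report itself runs. The function name `pearson_r` is hypothetical, and the inputs are the Age and Pregnancies values from the first five sample rows.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors of covariance and variances cancel in the
    # ratio, so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

ages = [47, 23, 31, 24, 21]    # Age, first five sample rows
pregnancies = [2, 0, 0, 0, 1]  # Pregnancies, same rows

r = pearson_r(ages, pregnancies)
# Changing location and scale of one variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([12 * a + 5 for a in ages], pregnancies)
print(r, r_rescaled)
```

Five rows are far too few to say anything about the full dataset. The example only illustrates the formula and its invariance to location and scale.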
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
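As a rough sketch of that definition, the snippet below ranks each variable and then applies the Pearson formula to the ranks. The helper names are hypothetical, and assigning tied values their average rank is an assumption (a common convention), not something the text above specifies.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman_rho(x, y), pearson_r(x, y))
```

Because `y` increases strictly with `x`, the ranks coincide and ρ is 1, while Pearson's r on the raw values stays below 1.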
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
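The pair-counting rule above corresponds to the variant usually called τ-a. The sketch below implements exactly that rule. The function name is hypothetical, and pairs tied in either variable are counted as neither concordant nor discordant. Library implementations often default to a tie-corrected variant (τ-b), so values for data with many ties may differ from this sketch.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1  # both variables move the same way
        elif direction < 0:
            discordant += 1  # the variables move in opposite ways
        # direction == 0: a tie in x or y, counted as neither
    return (concordant - discordant) / len(pairs)

# Three observations, one swapped pair: 2 concordant, 1 discordant, 3 pairs.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```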
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 138 | 62 | 35 | 0 | 33.6 | 0.127 | 47 | 1 |
| 1 | 0 | 84 | 82 | 31 | 125 | 38.2 | 0.233 | 23 | 0 |
| 2 | 0 | 145 | 0 | 0 | 0 | 44.2 | 0.630 | 31 | 1 |
| 3 | 0 | 135 | 68 | 42 | 250 | 42.3 | 0.365 | 24 | 1 |
| 4 | 1 | 139 | 62 | 41 | 480 | 40.7 | 0.536 | 21 | 0 |
| 5 | 0 | 173 | 78 | 32 | 265 | 46.5 | 1.159 | 58 | 0 |
| 6 | 4 | 99 | 72 | 17 | 0 | 25.6 | 0.294 | 28 | 0 |
| 7 | 8 | 194 | 80 | 0 | 0 | 26.1 | 0.551 | 67 | 0 |
| 8 | 2 | 83 | 65 | 28 | 66 | 36.8 | 0.629 | 24 | 0 |
| 9 | 2 | 89 | 90 | 30 | 0 | 33.5 | 0.292 | 42 | 0 |
Last rows
| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 1990 | 3 | 111 | 90 | 12 | 78 | 28.4 | 0.495 | 29 | 0 |
| 1991 | 6 | 102 | 82 | 0 | 0 | 30.8 | 0.180 | 36 | 1 |
| 1992 | 6 | 134 | 70 | 23 | 130 | 35.4 | 0.542 | 29 | 1 |
| 1993 | 2 | 87 | 0 | 23 | 0 | 28.9 | 0.773 | 25 | 0 |
| 1994 | 1 | 79 | 60 | 42 | 48 | 43.5 | 0.678 | 23 | 0 |
| 1995 | 2 | 75 | 64 | 24 | 55 | 29.7 | 0.370 | 33 | 0 |
| 1996 | 8 | 179 | 72 | 42 | 130 | 32.7 | 0.719 | 36 | 1 |
| 1997 | 6 | 85 | 78 | 0 | 0 | 31.2 | 0.382 | 42 | 0 |
| 1998 | 0 | 129 | 110 | 46 | 130 | 67.1 | 0.319 | 26 | 1 |
| 1999 | 2 | 81 | 72 | 15 | 76 | 30.1 | 0.547 | 25 | 0 |
Most frequently occurring
| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|
| 245 | 2 | 81 | 72 | 15 | 76 | 30.1 | 0.547 | 25 | 0 | 6 |
| 99 | 0 | 173 | 78 | 32 | 265 | 46.5 | 1.159 | 58 | 0 | 5 |
| 247 | 2 | 83 | 65 | 28 | 66 | 36.8 | 0.629 | 24 | 0 | 5 |
| 256 | 2 | 89 | 90 | 30 | 0 | 33.5 | 0.292 | 42 | 0 | 5 |
| 345 | 3 | 80 | 0 | 0 | 0 | 0.0 | 0.174 | 22 | 0 | 5 |
| 424 | 4 | 99 | 68 | 38 | 0 | 32.8 | 0.145 | 33 | 0 | 5 |
| 425 | 4 | 99 | 72 | 17 | 0 | 25.6 | 0.294 | 28 | 0 | 5 |
| 444 | 4 | 125 | 70 | 18 | 122 | 28.9 | 1.144 | 45 | 1 | 5 |
| 498 | 5 | 110 | 68 | 0 | 0 | 26.0 | 0.292 | 30 | 0 | 5 |
| 569 | 6 | 154 | 74 | 32 | 193 | 29.3 | 0.839 | 39 | 0 | 5 |
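A table like the one above can be approximated by counting identical rows. The snippet below is a small sketch using only the standard library. The six-row sample is made up from rows that appear in this report, the variable names are illustrative, and the exact definition the report uses for its "Duplicate rows" figure is not stated here, so treat this as one plausible approach rather than the tool's method.

```python
from collections import Counter

# Column order: Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin,
# BMI, DiabetesPedigreeFunction, Age, Outcome. Illustrative six-row sample.
rows = [
    (2, 81, 72, 15, 76, 30.1, 0.547, 25, 0),
    (0, 173, 78, 32, 265, 46.5, 1.159, 58, 0),
    (2, 81, 72, 15, 76, 30.1, 0.547, 25, 0),
    (4, 99, 72, 17, 0, 25.6, 0.294, 28, 0),
    (2, 81, 72, 15, 76, 30.1, 0.547, 25, 0),
    (0, 173, 78, 32, 265, 46.5, 1.159, 58, 0),
]

counts = Counter(rows)  # identical tuples are counted together
repeated = {row: c for row, c in counts.items() if c > 1}
most_common_row, most_common_count = counts.most_common(1)[0]

for row, count in sorted(repeated.items(), key=lambda item: -item[1]):
    print(count, row)
```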